Skip to content

[RISCV] add load/store misched/PostRA subtarget features #149409

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Aug 6, 2025

Conversation

danielhb
Copy link
Contributor

@danielhb danielhb commented Jul 17, 2025

Some processors benefit more from store clustering than load clustering,
and vice-versa, depending on factors that are exclusive to each one
(e.g. macrofusions implemented).

Likewise, certain optimizations benefits more from misched clustering
than postRA clustering. Macrofusions are again an example: in a
processor with store pair macrofusions, like the veyron-v1, it is
observed that misched clustering increases the amount of macrofusions
more than postRA clustering. This of course isn't necessarily true for
other processors, but it shows that processors can benefit from a more
fine grained control of clustering mutations, and each one is able to do
it differently.

Add 4 new subtarget features that deprecates the existing
riscv-misched-load-store-clustering and riscv-postmisched-load-store-clustering
options:

  • disable-misched-load-clustering and disable-misched-store-clustering:
    disable load/store clustering during misched;

  • disable-postmisched-load-clustering and disable-postmisched-store-clustering:
    disable load/store clustering during PostRA.

Note that the new subtarget features disables specific stages of the default
clustering settings. The default per se (load and store clustering for both
misched and PostRA) is left untouched.

Disable all clustering but misched-store-clustering for the veyron-v1 processor
using the new features.

Copy link

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot
Copy link
Member

llvmbot commented Jul 17, 2025

@llvm/pr-subscribers-backend-risc-v

Author: Daniel Henrique Barboza (danielhb)

Changes

Hi,

This PR adds 4 new cluster options to fine tune how loads and stores are grouped during misched and postRA passes.

The work was motivated by macrofusion tests with the veyron-v1 processor model. This model has stored pair macrofusions but it doesn't do much with load pairs. Adding new options to turn off load clustering improved the macrofusions amount and performance.

The existing 2 cluster options added by @asb are kept because I'm afraid to break existing users that might be using them. If that' s not a concern we can remove them and keep the new options only.


Full diff: https://github.com/llvm/llvm-project/pull/149409.diff

1 Files Affected:

  • (modified) llvm/lib/Target/RISCV/RISCVTargetMachine.cpp (+34-8)
diff --git a/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp b/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp
index b43b915d0ad4f..24ddd3e1fd1c7 100644
--- a/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp
+++ b/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp
@@ -99,11 +99,31 @@ static cl::opt<bool> EnableMISchedLoadStoreClustering(
     cl::desc("Enable load and store clustering in the machine scheduler"),
     cl::init(true));
 
+static cl::opt<bool> EnableMISchedLoadClustering(
+    "riscv-misched-load-clustering", cl::Hidden,
+    cl::desc("Enable load clustering in the machine scheduler"),
+    cl::init(true));
+
+static cl::opt<bool> EnableMISchedStoreClustering(
+    "riscv-misched-store-clustering", cl::Hidden,
+    cl::desc("Enable store clustering in the machine scheduler"),
+    cl::init(true));
+
 static cl::opt<bool> EnablePostMISchedLoadStoreClustering(
     "riscv-postmisched-load-store-clustering", cl::Hidden,
     cl::desc("Enable PostRA load and store clustering in the machine scheduler"),
     cl::init(true));
 
+static cl::opt<bool> EnablePostMISchedLoadClustering(
+    "riscv-postmisched-load-clustering", cl::Hidden,
+    cl::desc("Enable PostRA load clustering in the machine scheduler"),
+    cl::init(true));
+
+static cl::opt<bool> EnablePostMISchedStoreClustering(
+    "riscv-postmisched-store-clustering", cl::Hidden,
+    cl::desc("Enable PostRA store clustering in the machine scheduler"),
+    cl::init(true));
+
 static cl::opt<bool>
     EnableVLOptimizer("riscv-enable-vl-optimizer",
                       cl::desc("Enable the RISC-V VL Optimizer pass"),
@@ -301,10 +321,13 @@ ScheduleDAGInstrs *
 RISCVTargetMachine::createMachineScheduler(MachineSchedContext *C) const {
   ScheduleDAGMILive *DAG = createSchedLive(C);
   if (EnableMISchedLoadStoreClustering) {
-    DAG->addMutation(createLoadClusterDAGMutation(
-        DAG->TII, DAG->TRI, /*ReorderWhileClustering=*/true));
-    DAG->addMutation(createStoreClusterDAGMutation(
-        DAG->TII, DAG->TRI, /*ReorderWhileClustering=*/true));
+    if (EnableMISchedLoadClustering)
+      DAG->addMutation(createLoadClusterDAGMutation(
+          DAG->TII, DAG->TRI, /*ReorderWhileClustering=*/true));
+
+    if (EnableMISchedStoreClustering)
+      DAG->addMutation(createStoreClusterDAGMutation(
+          DAG->TII, DAG->TRI, /*ReorderWhileClustering=*/true));
   }
 
   const RISCVSubtarget &ST = C->MF->getSubtarget<RISCVSubtarget>();
@@ -318,10 +341,13 @@ ScheduleDAGInstrs *
 RISCVTargetMachine::createPostMachineScheduler(MachineSchedContext *C) const {
   ScheduleDAGMI *DAG = createSchedPostRA(C);
   if (EnablePostMISchedLoadStoreClustering) {
-    DAG->addMutation(createLoadClusterDAGMutation(
-        DAG->TII, DAG->TRI, /*ReorderWhileClustering=*/true));
-    DAG->addMutation(createStoreClusterDAGMutation(
-        DAG->TII, DAG->TRI, /*ReorderWhileClustering=*/true));
+    if (EnablePostMISchedLoadClustering)
+      DAG->addMutation(createLoadClusterDAGMutation(
+          DAG->TII, DAG->TRI, /*ReorderWhileClustering=*/true));
+
+    if (EnablePostMISchedStoreClustering)
+      DAG->addMutation(createStoreClusterDAGMutation(
+          DAG->TII, DAG->TRI, /*ReorderWhileClustering=*/true));
   }
 
   return DAG;

@@ -301,10 +321,13 @@ ScheduleDAGInstrs *
RISCVTargetMachine::createMachineScheduler(MachineSchedContext *C) const {
ScheduleDAGMILive *DAG = createSchedLive(C);
if (EnableMISchedLoadStoreClustering) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

even though the old option is preserved, it now does nothing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All added options are defaulted to 'true', similar to the 2 existing options. But the older options have precedence. In the commit msg I gave an example:

"To preserve the existing behavior of the existing flags all options added
are default 'true'. Note that the two existing flags have precedence, so
setting "riscv-misched-load-store-clustering=false" will disable load
mutations even if setting riscv-misched-load-clustering=true. Same thing
with stores and with the postRA options."

So riscv-misched-load-store-clustering=false will turn off misched clustering for both loads and stores, regardless of riscv-misched-(load/store)-clustering being set to 'true' in the same command line.

Perhaps a better wording would be "disabling takes precedence" because that's how the code works, e.g.:

  • riscv-misched-load-store-clustering=false will always disable both load and store clustering for misched;
  • riscv-misched-load-clustering=false will always disable load clustering for misched;
  • riscv-misched-store-clustering=false will always disable store clustering for misched.

I thought about inverting the semantics of the flag, i.e. adding options that would disable steps instead of enabling, but that didn't make the situation much better. What would make it better is to remove the existing 2 options, but then we'll have to deal with potential broken scripts.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, got it. Thanks!

@mgudim mgudim requested a review from topperc July 18, 2025 14:53
Copy link
Contributor

@mgudim mgudim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but let's wait to see what others say if it is OK to remove the old options.

Copy link
Member

@mshockwave mshockwave left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm generally in favor of removing the existing two options (i.e. riscv-misched-load-store-clustering and riscv-postmisched-load-store-clustering). This is not a Clang flag and LLVM flags change all the time.

Non blocking, but an alternative solution to avoid having two flags could be making -riscv-misched-load-store-clustering (and its post-RA counterpart) to take a list of "load" or "store" instead to enable either or both clusterings.

@danielhb danielhb force-pushed the load_store_cluster_opts branch from 1576ca9 to 9de5329 Compare July 18, 2025 17:21
@danielhb
Copy link
Contributor Author

Thanks for the reviews. Removing the original flags is a cleaner way of going about it so I just did that.

@danielhb danielhb requested a review from mgudim July 18, 2025 17:24
Copy link
Member

@mshockwave mshockwave left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM w/ minor comments

Please avoid force push unless necessary: https://llvm.org/docs/GitHub.html#rebasing-pull-requests-and-force-pushes

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you add new tests checking the situation where only either of the flags is enabled?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I'll add two new tests disabling load and store clustering respectively

Copy link
Contributor

@wangpc-pp wangpc-pp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks!

Copy link
Contributor

@mgudim mgudim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

static cl::opt<bool> EnableMISchedLoadStoreClustering(
"riscv-misched-load-store-clustering", cl::Hidden,
cl::desc("Enable load and store clustering in the machine scheduler"),
static cl::opt<bool> EnableMISchedLoadClustering(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should these come from subtarget instead of command line?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did I miss something? We don't have subtarget features for these?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was my point. Should we add Subtarget features instead? The description of the PR suggests that whether this is profitable is CPU specific.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, make sense to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By Subtarget feature are we talking about a a feature, like FeatureUnalignedScalarMem, or a tune like TunePostRAScheduler (took the examples from sifive_p670)? Seems to me that this would be more like a tuning like PostRAScheduler.

Also, do we want to keep the default as is (enable all load/store clustering in both stages) and add features to disable it, or the other way around? Having everything disabled by default and adding subtarget features to enable what you want is cleaner, but we'll change behavior for everyone that doesn't add the new features in their target defs. Perhaps this is a good thing (we'll force people to make a decision instead of relying on defaults) ....

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just checked how TunePostRAScheduler is implemented and it has both a subtarget feature and also a flag (post-RA-scheduler) that can be used to override the target definition. I like this design because I give the flexibility of the command line (for debugging and whatnot) while also giving each processor a convenient way of setting clustering preferences.

If no one opposes I'll go this direction, but I guess we'll want to turn the default to 'false' and let each processor define what clustering they want

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing the default settings of load/store clustering breaks 290+ tests in check-llvm. This shows that there's a LOT of code and logic built on top of the existing default. I don't want to touch this viper nest in this work so I'll keep the default as is. This means that the subtarget features will disable the default settings instead of enabling - TuneNoDefaultUnroll would be an example.

Also, after adding both the command line options and subtarget features I realized that I might be complicating things too much, given that we can override the subtarget features via "mattr=-" anyway, so I'll change the code to send just the subtarget features. I'll send a new version shortly.

@danielhb danielhb changed the title [RISCV] add load/store misched/PostRA cluster options [RISCV] add load/store misched/PostRA subtarget features Jul 24, 2025
@danielhb danielhb requested review from wangpc-pp and mshockwave July 24, 2025 13:57
@@ -1700,6 +1700,18 @@ def TuneNLogNVRGather
def TunePostRAScheduler : SubtargetFeature<"use-postra-scheduler",
"UsePostRAScheduler", "true", "Schedule again after register allocation">;

def TuneDisableMISchedLoadClustering : SubtargetFeature<"disable-misched-load-clustering",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think @topperc meant to mimic something like "bool RISCVSubtarget::useAA()" for example

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No I meant something we could set per-CPU like this.

@@ -1700,6 +1700,18 @@ def TuneNLogNVRGather
def TunePostRAScheduler : SubtargetFeature<"use-postra-scheduler",
"UsePostRAScheduler", "true", "Schedule again after register allocation">;

def TuneDisableMISchedLoadClustering : SubtargetFeature<"disable-misched-load-clustering",
"NoMISchedLoadClustering", "true", "Disable load clustering in the machine scheduler">;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"NoMISchedLoadClustering", "true", "Disable load clustering in the machine scheduler">;
"MISchedLoadClustering", "false", "Disable load clustering in the machine scheduler">;

Negate the enum and the value. See also TuneNoDefaultUnroll and TuneNoSinkSplatOperands.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can add Enable before the enum, like EnableMISchedLoadClustering. So that the auto-generated methods will be like enableXXX().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That explains why I didn't find an enableDefaultUnroll() method ...

@@ -150,6 +150,14 @@ class RISCVSubtarget : public RISCVGenSubtargetInfo {

bool enablePostRAScheduler() const override { return UsePostRAScheduler; }

bool enableMISchedLoadClustering() const { return MISchedLoadClustering; }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we don't need these because there are auto-generated methods.

Copy link
Contributor

@wangpc-pp wangpc-pp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Copy link
Collaborator

@topperc topperc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@danielhb
Copy link
Contributor Author

danielhb commented Aug 6, 2025

Ping

Do we require more reviews or is this good to go? Also, I don't have write access (this is my first PR for LLVM) so I'm afraid someone will have to do the honors merging this one. I'm not sure if a rebase is needed due to the conflict being pointed out too ...

@danielhb
Copy link
Contributor Author

danielhb commented Aug 6, 2025

Updating the branch to fix conflicts

@danielhb danielhb force-pushed the load_store_cluster_opts branch from 8561859 to 170d0ae Compare August 6, 2025 13:25
@topperc
Copy link
Collaborator

topperc commented Aug 6, 2025

Ping

Do we require more reviews or is this good to go? Also, I don't have write access (this is my first PR for LLVM) so I'm afraid someone will have to do the honors merging this one. I'm not sure if a rebase is needed due to the conflict being pointed out too ...

You have enough reviews. The reviewers probably didn't think about you not having write permission.

Some processors benefit more from store clustering than load clustering,
and vice-versa, depending on factors that are exclusive to each one
(e.g. macrofusions implemented).

Likewise, certain optimizations benefits more from misched clustering
than postRA clustering. Macrofusions are again an example: in a
processor with store pair macrofusions, like the veyron-v1, it is
observed that misched clustering increases the amount of macrofusions
more than postRA clustering.  This of course isn't necessarily true for
other processors, but it shows that processors can benefit from a more
fine grained control of clustering mutations, and each one is able to do
it differently.

Add 4 new clustering options that deprecates the existing
riscv-misched-load-store-clustering and
riscv-postmisched-load-store-clustering options:

- riscv-misched-load-clustering and riscv-misched-store-clustering:
  enable/disable load/store clustering during misched;

- riscv-postmisched-load-clustering and
  riscv-postmisched-store-clustering: enable/disable load/store
  clustering during PostRA.

To preserve the existing clustering behavior all 4 options are defaulted to
'true'.
- add store-clustering=false to the existing LDCLUSTER test;
- add a new STCLUSTER test with load-clustering=false;
- add a new DEFAULTCLUSTER test with default clustering options.
Added 4 subtarget features that disables the default cluster settings:

- TuneDisableMISched(Load|Store)Clustering
- TuneDisablePostMISched(Load|Store)Clustering

The 4 command line options added previously were removed.

Use the new subtarget features in Ventana veyron-v1 to allow only
misched store clustering.

Test changes:

- add a store clustering test: misched-store-clustering.ll
- add subtarget features tests in misched-load-clustering.ll and
  misched-store-clustering.ll
- misched-mem-clustering.mir: use the subtarget features
Instead of using NoMISchedLoadClustering and setting it to 'true' using
the feature, rename it to NoMISchedLoadClustering and make the feature
set it to 'false'.

Rename the subtarget method disableMISchedLoadClustering() to
enableMISchedLoadClustering() to remove the negative check in
createMachineScheduler() when enabling the clustering.

Do the same with the other 3 subtarget features.
We can get the existing enableXXClustering methods to be
autogenerated by prefixing the enums with 'Enable', e.g.
'EnableMISchedLoadClustering', and an autogenerated
'enableMISchedLoadClustering' method will be created.
@danielhb danielhb force-pushed the load_store_cluster_opts branch from 170d0ae to a743f80 Compare August 6, 2025 15:30
@danielhb
Copy link
Contributor Author

danielhb commented Aug 6, 2025

Ping
Do we require more reviews or is this good to go? Also, I don't have write access (this is my first PR for LLVM) so I'm afraid someone will have to do the honors merging this one. I'm not sure if a rebase is needed due to the conflict being pointed out too ...

You have enough reviews. The reviewers probably didn't think about you not having write permission.

Got it. I asked @mgudim for help.

@mshockwave mshockwave merged commit 8e57689 into llvm:main Aug 6, 2025
5 checks passed
Copy link

github-actions bot commented Aug 6, 2025

@danielhb Congratulations on having your first Pull Request (PR) merged into the LLVM Project!

Your changes will be combined with recent changes from other authors, then tested by our build bots. If there is a problem with a build, you may receive a report in an email or a comment on this PR.

Please check whether problems have been caused by your change specifically, as the builds can include changes from many authors. It is not uncommon for your change to be included in a build that fails due to someone else's changes, or infrastructure issues.

How to do this, and the rest of the post-merge process, is covered in detail here.

If your change does cause a problem, it may be reverted, or you can revert it yourself. This is a normal part of LLVM development. You can fix your changes and open a new PR to merge them again.

If you don't get any reports, no action is required from you. Your changes are working as expected, well done!

@mshockwave
Copy link
Member

Ping
Do we require more reviews or is this good to go? Also, I don't have write access (this is my first PR for LLVM) so I'm afraid someone will have to do the honors merging this one. I'm not sure if a rebase is needed due to the conflict being pointed out too ...

You have enough reviews. The reviewers probably didn't think about you not having write permission.

Got it. I asked @mgudim for help.

I just merged it on your behalf

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants